Also for k-means: more data does not imply better performance
Abstract
Arguably, a desirable feature of a learner is that its performance gets better with an increasing amount of training data, at least in expectation. This issue has received renewed attention in recent years, and some curious and surprising findings have been reported. In essence, these results show that more data does not necessarily lead to improved performance; worse even, performance can deteriorate. Clustering, however, has not been subjected to this kind of study up to now. This paper shows that k-means clustering, a ubiquitous technique in machine learning and data mining, suffers from the same lack of so-called monotonicity and can display deterioration in expected performance with increasing training set sizes. Our main, theoretical contributions prove that 1-means clustering is monotonic, while 2-means is not even weakly monotonic, i.e., the occurrence of nonmonotonic behavior persists indefinitely, beyond any training sample size. For larger k, the question remains open.
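The notion of monotonicity in the abstract can be made concrete with a small simulation: for each training set size n, fit k-means on many random samples and average the true risk (expected squared distance to the nearest center). The sketch below is only an illustration under stated assumptions. The three-point distribution, the function names, and the choice to return a single center when a sample has only one distinct value are all hypothetical and are not taken from the paper; the distribution is not claimed to reproduce the paper's nonmonotonic construction.

```python
import random
from statistics import fmean

# Hypothetical discrete distribution on the real line (an assumption for
# illustration only, not the construction used in the paper).
SUPPORT = [0.0, 1.0, 4.0]
PROBS = [0.45, 0.45, 0.10]


def fit_k_means_1d(sample, k):
    """Return globally optimal k-means centers for k in {1, 2} on 1-D data."""
    xs = sorted(sample)
    # Assumption: with a single distinct value, only one center is returned.
    if k == 1 or len(set(xs)) == 1:
        return [fmean(xs)]
    best_cost, best_centers = float("inf"), None
    # In one dimension the optimal 2-means clusters are contiguous,
    # so it suffices to try every split of the sorted sample.
    for i in range(1, len(xs)):
        left, right = xs[:i], xs[i:]
        c_left, c_right = fmean(left), fmean(right)
        cost = sum((x - c_left) ** 2 for x in left)
        cost += sum((x - c_right) ** 2 for x in right)
        if cost < best_cost:
            best_cost, best_centers = cost, [c_left, c_right]
    return best_centers


def true_risk(centers):
    """Exact expected squared distance to the nearest center."""
    return sum(
        p * min((x - c) ** 2 for c in centers)
        for x, p in zip(SUPPORT, PROBS)
    )


def expected_risk(n, k, trials=2000, seed=0):
    """Monte Carlo estimate of the expected risk at training set size n."""
    rng = random.Random(seed)
    risks = []
    for _ in range(trials):
        sample = rng.choices(SUPPORT, weights=PROBS, k=n)
        risks.append(true_risk(fit_k_means_1d(sample, k)))
    return fmean(risks)


if __name__ == "__main__":
    # Columns: n, estimated expected risk for k=1, for k=2.
    for n in range(1, 11):
        print(n, round(expected_risk(n, 1), 4), round(expected_risk(n, 2), 4))
```

A learner is monotonic in this sense if the estimated curve never increases with n. For k = 1 the fitted center is the sample mean, so the expected risk equals σ²(1 + 1/n) and decreases with n, which is consistent with the abstract's claim for 1-means. Whether the k = 2 column shows an increase depends on the distribution chosen, and Monte Carlo noise should be ruled out before reading anything into small differences.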
Similar resources
Does more data always yield better translations?
Nowadays, there are large amounts of data available to train statistical machine translation systems. However, it is not clear whether all the training data actually help or not. A system trained on a subset of such huge bilingual corpora might outperform the use of all the bilingual data. This paper studies such issues by analysing two training data selection techniques: one based on approxima...
Superintelligence Does Not Imply Benevolence
As machines become capable of more autonomous and intelligent behavior, will they also display more morally desirable behavior? Earth’s history tends to suggest that increasing intelligence, knowledge, and rationality will result in more cooperative and benevolent behavior. Animals with sophisticated nervous systems track and punish exploitative behavior, while rewarding cooperation. Humans form ...
Persistent K-Means: Stable Data Clustering Algorithm Based on K-Means Algorithm
Identifying clusters, or clustering, is an important aspect of data analysis. It is the task of grouping a set of objects in such a way that objects in the same group/cluster are more similar in some sense or another. It is a main task of exploratory data mining, and a common technique for statistical data analysis. This paper proposed an improved version of K-Means algorithm, namely Persistent K...
Count(q) Does Not Imply Count(p)
I solve a conjecture originally studied by M. Ajtai. It states that for different primes q, p the matching principles Count(q) and Count(p) are logically independent. I prove that this indeed is the case. Actually I show that Count(q) implies Count(p) exactly when each prime factor in p also is a factor in q. 1 The logic of elementary counting “She loves me, she loves me not, she loves me,. . ....
Does Level-k Behavior Imply Level-k Thinking?
I design an experiment to interpret the observed Lk behavior. It distinguishes between the “Lkb” players, who have high ability and best respond to Lk belief, and the “Lka” players, who could use, at most, k steps of reasoning, and thus could not respond to L(k+1) or higher-order belief. The separation utilizes a combination of simultaneous and sequential ring games. In the sequential games it r...
Journal
Journal title: Machine Learning
Year: 2023
ISSN: 0885-6125, 1573-0565
DOI: https://doi.org/10.1007/s10994-023-06361-6